Interpolation

Very often we have measured a series of physical quantities $y_{i}$ that have a dependendence on some variables $x_{i}$ and want to know the expected values of $y$ for values of $x$ that we didn't measure.

If the values of $x$ are inside the range of $x_{i}$ we call this procedure interpolation. If they are beyond we call it extrapolation.

The process of interpolation is usually done in two steps:

Finding a analytical function that goes over the provided points
Evaluating that function at the desired point $x$

A proper interpolation procedure in physics should actually create a model that provides the analytical function over which we expect that all the points will lie, including the desired points we haven't measured. However, such physical insight is not always available. That's why we will define a mathematical framework for the interpolation, forgetting (gasp!) that we want to apply this techniques in a physical setting.

First steps - 1D interpolation

We will explore different fitting routines and methods availably in scipy.



In [2]:

    
from scipy.interpolate import interp1d
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [3]:

    
#generate data for a known funcion
x = linspace(0.0,10.0,21)
y = sin(x**1.5 + 2.0)
x_true = linspace(0.0,10.0,1000)
y_true = sin(x_true**1.5 + 2.0)



In [4]:

    
#first visualization
plot(x_true, y_true)
scatter(x,y)
legend(['true','data'])

#make a nice plot
ax = axes()
ax.set_xlabel("$x$",fontsize=20)
ax.set_ylabel("$f(x)$",fontsize=20)
ax.set_title("$\mathrm{data\ and\ function}$", fontsize=20)
filename = 'figure1'
savefig(filename + '.pdf',format = 'pdf', transparent=True)

In the linear interpolation a line is drawn between two adjacednt points $(x_0, y_0)$ and $(x_1,y_1)$.

In that case the interpolating function can be defined from

$$ \frac{y-y_0}{x-x_0} = \frac{y_1-y_0}{x_1-x_0} $$

solving for the value of $y$

$$ y = y_0 + (x-x_0)\frac{y_1-y_0}{x_1-x_0} $$

Generalizing this logic, one could suggest that for $N$ points in the the plane $(x_i,y_i)$ there will be a unique polinomial of degree less $N-1$ that passes through all the points, $P(x_i)=y_i$, $i=1,\ldots,N$.

The polynomial $P$ can be written as:

$$ P(x) = \sum_{i=1}^N\left(\prod_{j=1,j\neq i}^N \frac{x - x_{j}}{x_{i}-x_{j}}\right) y_{i} $$

or in a equivalent way

$$ P(x) = \sum_{i=1}^N P_{i} (x) $$

where $$ P_i (x) = y_i \prod_{j=1, j\neq i}^{N} \frac{x-x_{j}}{x_i - x_j} $$

Which is known as the lagrangian form of the polynomial.

In the case of $N=2$

$$ y = y_0 \frac{x-x_1}{x_0-x_1} + y_1\frac{x-x_0}{x_1-x_0} $$



In [5]:

    
x_in = array([0,1])
y_in = array([1,4])
def poli(x, x_in, y_in):
    value = y_in[0]*((x-x_in[1])/(x_in[0]-x_in[1])) + y_in[1]*((x-x_in[0])/(x_in[1]-x_in[0]))
    return value



In [6]:

    
x_l = linspace(0.0,3.0,100)
y_l = poli(x_l, x_in, y_in)
plot(x_l,y_l)
scatter(x_in, y_in)









    Out[6]:





<matplotlib.collections.PathCollection at 0x5574a50>

In the case of $N=3$



In [7]:

    
x_in = array([0,1,5])
y_in = array([1,4,-2])
def poli(x, x_in, y_in):
    p0 = y_in[0]*((x-x_in[1])/(x_in[0]-x_in[1]))*((x-x_in[2])/(x_in[0]-x_in[2]))
    p1 = y_in[1]*((x-x_in[0])/(x_in[1]-x_in[0]))*((x-x_in[2])/(x_in[1]-x_in[2]))
    p2 = y_in[2]*((x-x_in[0])/(x_in[2]-x_in[0]))*((x-x_in[1])/(x_in[2]-x_in[1]))
    value = p0 + p1 + p2
    return value



In [9]:

    
x_l = linspace(0.0,5.0,100)
y_l = poli(x_l, x_in, y_in)
print len(y_l)
plot(x_l,y_l)
scatter(x_in, y_in)









    



100






    Out[9]:





<matplotlib.collections.PathCollection at 0x573b210>



In [8]:

    
#defines a linear interpolation
f = interp1d(x, y)



In [9]:

    
#other kinds of interpolation are 'linear','nearest', 'zero', 'slinear', 'quadratic'
scatter(x,y)
plot(x_true, y_true)
plot(x_true, f(x_true))
legend(['true', 'linear', 'data'])









    Out[9]:





<matplotlib.legend.Legend at 0x433f550>

The problem with the linear interpolations is that the derivatives clearly jump from one side to the other of the data points. In the case of using all the points to define a polynomial the problem is that the values of the function can be too off from the true function.

A way to solve this problem is the method of splines. A spline is a piecewise polynomial, each piece joining to points. Depending on the kind of spline one is doing, one is looking for different characteristics for this set of polinomial.

For instance, in the case of a cubic spline one is looking for segments of polynomials that will be smooth in the first derivative and continuous in the second derivative.

A Practical Guide to Splines (revised edition) by Carl de Boor is one of the best references if you really want to dig into the subject.

Example

Wikipedia gives us a nice example of a spline in three pieces that has a bell shape.



In [10]:

    
x1 = linspace(-2.0, -1.0,100)
x2 = linspace(-1.0,1.0,100)
x3 = linspace(1.0,2.0,100)

y1 = 0.25*(x1+ 2.0)**3.0
y2 = 0.25*(3*abs(x2)**3.0 -6.0*x2**2 + 4.0)
y3 = 0.25*(2.0-x3)**3.0
plot(x1,y1, color='red')
plot(x2,y2, color='blue')
plot(x3,y3, color='green')









    Out[10]:





[<matplotlib.lines.Line2D at 0x41ba3d0>]



In [11]:

    
# this is the result of a cubic spline
f2 = interp1d(x, y, kind='cubic')
scatter(x,y)
plot(x_true, y_true)
plot(x_true, f2(x_true))
legend(['true', 'cubic', 'data'])









    Out[11]:





<matplotlib.legend.Legend at 0x4fc5c50>

2D interpolation



In [12]:

    
# I create a 2D function
def my_func(x, y):
    return exp(-((x-0.5)**2)/0.1) * sin(20.0*(y**2))



In [13]:

    
#that I only now on 500 points
points = np.random.rand(500, 2)
values = my_func(points[:,0], points[:,1])



In [14]:

    
#here i define a dense grid in x-y with ranges [0,1], [0,1] to see the complete funcion
grid_x, grid_y = mgrid[0:1:200j, 0:1:200j]



In [15]:

    
imshow(my_func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower')
plot(points[:,0], points[:,1], 'k.', ms=2, alpha = 0.8)

ax = axes()
ax.set_xlabel("$x$",fontsize=20)
ax.set_ylabel("$y$",fontsize=20)
ax.set_title("$\mathrm{data\ and\ function}$", fontsize=20)









    Out[15]:





<matplotlib.text.Text at 0x4fa5f10>



In [16]:

    
from scipy.interpolate import griddata
grid_zA = griddata(points, values, (grid_x, grid_y), method='nearest')
grid_zB = griddata(points, values, (grid_x, grid_y), method='linear')
grid_zC = griddata(points, values, (grid_x, grid_y), method='cubic')



In [17]:

    
imshow(grid_zA.T, extent=(0,1,0,1), origin='lower')
plot(points[:,0], points[:,1], 'k.', ms=2, alpha = 0.8)

ax = axes()
ax.set_xlabel("$x$",fontsize=20)
ax.set_ylabel("$y$",fontsize=20)
ax.set_title("$\mathrm{data\ and\ nearest\ interpolation}$", fontsize=20)









    Out[17]:





<matplotlib.text.Text at 0x3edaa10>



In [18]:

    
imshow(grid_zB.T, extent=(0,1,0,1), origin='lower')
plot(points[:,0], points[:,1], 'k.', ms=2, alpha = 0.8)

ax = axes()
ax.set_xlabel("$x$",fontsize=20)
ax.set_ylabel("$y$",fontsize=20)
ax.set_title("$\mathrm{data\ and\ linear\ interpolation}$", fontsize=20)









    Out[18]:





<matplotlib.text.Text at 0x3efa450>



In [19]:

    
imshow(grid_zC.T, extent=(0,1,0,1), origin='lower')
plot(points[:,0], points[:,1], 'k.', ms=2, alpha = 0.8)

ax = axes()
ax.set_xlabel("$x$",fontsize=20)
ax.set_ylabel("$y$",fontsize=20)
ax.set_title("$\mathrm{data\ and\ cubic\ interpolation}$", fontsize=20)









    Out[19]:





<matplotlib.text.Text at 0x53f50d0>

This brief outline is based on the the scipy documentation webpage: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

Exercise

Write a program in Python that given a set of N points in $x$ and $y$ returns the interpolating polinomial. Test it with the following arrays



In [20]:

    
x_in = array([1.0,2.0,4.0,5.0,6.0,7.0,8.0])
y_in = array([3.0,-2.0,2.0,4.0,1.0,2.0,3.0])



In [21]:

    
scatter(x_in, y_in)









    Out[21]:





<matplotlib.collections.PathCollection at 0x55d7410>

Find the interpolating polinomials for the three set of points taken from the following function



In [37]:

    
def p(x):
    return 1.0/(1.0+25.0*x**2)



In [38]:

    
a = linspace(-1.0,1.0,3)
print p(a)
b = linspace(-1.0,1.0,5)
print p(b)
c = linspace(-1.0,1.0,9)
print p(c)









    



[ 0.03846154  1.          0.03846154]
[ 0.03846154  0.13793103  1.          0.13793103  0.03846154]
[ 0.03846154  0.06639004  0.13793103  0.3902439   1.          0.3902439
  0.13793103  0.06639004  0.03846154]



In [49]:

    
scatter(a,p(a), color="black", alpha=1.0, s=30)
scatter(b,p(b), color="red", alpha=0.7,s=120)
scatter(c,p(c), color="green", alpha=0.4,s=500)









    Out[49]:





<matplotlib.collections.PathCollection at 0x7d7a450>